An encoder-decoder based generation model for online handwritten mathematical expressions
Abstract
Objective Advances in digitization and intelligent techniques have made it easier to recognize text content originating from paper documents, photos, and similar sources. Online mathematical expression recognition converts an online sequence of handwritten trajectory points into expression text, and it is widely used on portable devices such as mobile phones and tablet PCs. The task requires converting the handwritten trajectory into symbols and the logical relationships between them, such as powers, subscripts, and matrices. An online math calculator can accept expressions through handwriting recognition, which makes input easier than typing LaTeX for symbols with complex relations, and instant electronic recording becomes feasible in scenarios such as classes and academic meetings. Encoder-decoder based recognition methods have been developing intensively, and the quality and quantity of training data have a great impact on the performance of deep neural networks. Online data takes the form of track point sequences, which must be collected on a real-time handwriting device and then annotated, so collection costs more than for offline data. When training data is insufficient, the generalization and robustness of deep neural networks on this task suffer.
Method To address these problems, we develop an encoder-decoder based generation model for online handwritten mathematical expressions. The model generates the corresponding online trajectory point sequence from a given expression text, and it synthesizes samples in different writing styles from different style inputs. A large amount of near-real data can thus be obtained at very low cost, which flexibly expands the scale of the training data and helps avoid under-fitting or over-fitting of the recognition model. For generation tasks, the representation and discrimination ability of the encoder often affects the generated results directly: representations of different inputs need to differ sufficiently, while representations of similar inputs need to retain some similarity. Intuitively, the tree structure reflects the similarities and differences between expressions to some extent, so we design a tree representation-based text feature extraction module in the encoder that makes full use of two-dimensional structural information. In addition, there is no alignment information between each input character and the output trajectory points, so we introduce a location-based attention algorithm that aligns the input text sequence with the output trajectory sequence. To generate samples in multiple writing styles, we also integrate the style features of different writers into the decoder.
Result The proposed method is evaluated in two respects: the visual quality of the generated results and the improvement it brings to recognition tasks. First, we visualize generated results for input texts of different difficulty, including simple sequences, complex fractions, multi-line expressions, and long texts, as well as for different writer styles; the model is effective in most cases. Second, synthetic data produced by the generation model is used to augment the training set of three recognition models: Transformer-TAP (track, attend, and parse), TAP, and DenseTAP-TD (DenseNet TAP with tree decoder). The additional data enriches the training set with more symbol combinations. After training with the augmented data, the absolute recognition rates of the three models increase by 0.98%, 1.55%, and 1.06%, and the relative recognition rates increase by 9.9%, 12.37%, and 9.81%, respectively.
Conclusion The proposed online generation model augments the original dataset more flexibly, and the experimental results demonstrate that it effectively improves the generalization performance and accuracy of online recognition models.
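The abstract names a location-based attention algorithm for aligning input characters with output trajectory points but does not give its form. As a rough illustration only, the sketch below assumes a Gaussian-window formulation that is common in handwriting synthesis: each window has a centre on the character axis that can only move forward as trajectory points are emitted. The function names (`gaussian_window`, `step_kappa`, `attend`) and all parameter values are hypothetical and are not taken from the paper.

```python
import math

def gaussian_window(alpha, beta, kappa, num_chars):
    """Attention weight for each character index u (assumed Gaussian-window form):
    sum over windows k of alpha_k * exp(-beta_k * (kappa_k - u)^2)."""
    return [
        sum(a * math.exp(-b * (k - u) ** 2) for a, b, k in zip(alpha, beta, kappa))
        for u in range(num_chars)
    ]

def step_kappa(kappa_prev, raw_shift):
    """Advance each window centre by exp(raw_shift), which is always positive,
    so the attended position moves monotonically through the text."""
    return [k + math.exp(s) for k, s in zip(kappa_prev, raw_shift)]

def attend(char_features, weights):
    """Context vector: weighted sum of the per-character feature vectors."""
    dim = len(char_features[0])
    return [sum(w * f[d] for w, f in zip(weights, char_features)) for d in range(dim)]

# Toy input: four tokens with one-hot features (hypothetical stand-ins for
# the encoder's text features).
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

kappa = [0.0]                                # one window, centred on token 0
weights = gaussian_window([1.0], [2.0], kappa, len(features))
context = attend(features, weights)          # dominated by token 0

kappa = step_kappa(kappa, [0.0])             # shift by exp(0) = 1 token
weights_next = gaussian_window([1.0], [2.0], kappa, len(features))
```

In a trained model the window parameters would be predicted by the decoder at every output step; here they are fixed by hand so that the forward-only movement of the window is easy to see.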
Similar resources
A Structural Analysis Approach for Online Handwritten Mathematical Expressions
This paper proposes a structural analysis approach for mathematical expressions based on the Attribute String Grammar and the Baseline Tree Transformation approaches. The approach consists of geometrical feature extraction, parsing structure and expression analysis steps. The algorithm for structure parsing uses baselines, which are represented by geometrical features to recursively decompose t...
LSTM Encoder–decoder for Dialogue Response Generation
This paper presents a dialogue response generator based on long short-term memory (LSTM) neural networks for the SLG (Spoken Language Generation) pilot task of DSTC5 [1]. We first encode the input, which contains a varying number of semantic units, as a fixed-length semantic vector with an LSTM encoder. Then we decode the semantic vector with a variant of LSTM and generate the corresponding text. In order t...
Online symbol segmentation and recognition in handwritten mathematical expressions
This paper is concerned with the symbol segmentation and recognition task in the context of on-line sampled handwritten mathematical expressions, the first processing stage of an overall system for understanding arithmetic formulas. Within our system a statistical approach is used tolerating ambiguities within the decision stages and resolving them either automatically by additional knowledge a...
On Machine Understanding of Online Handwritten Mathematical Expressions
This paper aims at automatic recognition of online handwritten mathematical expressions written on an electronic tablet. The proposed technique involves two major stages: symbol recognition and structural analysis. A multiple-classifier system consisting of both parametric and nonparametric classifiers is used for symbol recognition. The parametric classifier is based on a Hidden Markov Model (HMM), ...
Online Handwritten Mathematical Expressions Recognition System Using Fuzzy Neural Network
The paper is devoted to the development of a new online handwritten mathematical expression recognition system. It presents a method for recognizing handwritten symbols that uses the fuzzy neural network NEFCLASS as the classifier.
Journal
Journal title: Journal of Image and Graphics
Year: 2023
ISSN: 1006-8961
DOI: https://doi.org/10.11834/jig.220894